2 research outputs found

    Longitudinal study of first-time freshmen using data mining

    Higher education is transitioning from an enrollment mode to a recruitment mode, a shift that has opened the way for institutional research and policy making grounded in historical data. More and more universities in the U.S. are implementing enterprise resource planning (ERP) systems, which collect vast amounts of data. Although a few researchers have applied data mining to the prediction of performance, graduation rates, and persistence, research in this area is sparse and lacks rigorous development and evaluation of data mining models. The primary objective of this research was to build and analyze data mining models on historical data to discover patterns and rules that classify students who are likely to drop out and students who are likely to persist.

    Student retention is a major problem for higher education institutions, and predictive models developed with traditional quantitative methods do not produce highly accurate results because of the massive amounts of data, correlation between attributes, missing values, and non-linearity of variables; data mining techniques, however, work well under these conditions. In this study, various data mining models were used along with discretization, feature subset selection, and cross-validation; the results were analyzed not only in terms of the probability of detection and the probability of false alarm, but also in terms of the variances of these performance measures. Attributes were grouped according to the current hypotheses in the literature. Using the results of feature subset selectors and treatment learners, the attributes that contributed most to a student's decision to drop out or stay were identified, and specific rules that characterize a successful student were found. The performance measures obtained in this study were significantly better than those previously reported in the literature.
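    The two performance measures named above come from the binary confusion matrix: the probability of detection (PD) is the fraction of actual drop-outs the model flags, and the probability of false alarm (PF) is the fraction of persisting students it wrongly flags. The sketch below is a minimal illustration, not the dissertation's actual pipeline: it chains discretization, feature subset selection, and a learner under 10-fold cross-validation and reports the mean and spread of PD and PF. The decision-tree learner, the synthetic data, and all parameter values are assumptions, since the abstract does not name the specific learners used.

        import statistics

        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.model_selection import StratifiedKFold
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import KBinsDiscretizer
        from sklearn.tree import DecisionTreeClassifier

        def pd_pf(y_true, y_pred):
            # PD = TP / (TP + FN), PF = FP / (FP + TN) for the positive
            # ("drop-out") class of a binary confusion matrix.
            tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
            fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
            fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
            tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
            return tp / (tp + fn), fp / (fp + tn)

        # Synthetic stand-in for the confidential student records.
        X, y = make_classification(n_samples=500, n_features=20, random_state=0)

        # Discretization -> feature subset selection -> learner, as in the
        # pipeline described above (the learner choice is an assumption).
        model = make_pipeline(
            KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform"),
            SelectKBest(f_classif, k=8),
            DecisionTreeClassifier(random_state=0),
        )

        pds, pfs = [], []
        for train, test in StratifiedKFold(n_splits=10).split(X, y):
            model.fit(X[train], y[train])
            pd_i, pf_i = pd_pf(y[test], model.predict(X[test]))
            pds.append(pd_i)
            pfs.append(pf_i)

        # Report both the means and the spread, since the study analyzed
        # the variance of the performance measures, not just the averages.
        print(f"PD: mean={statistics.mean(pds):.3f} sd={statistics.stdev(pds):.3f}")
        print(f"PF: mean={statistics.mean(pfs):.3f} sd={statistics.stdev(pfs):.3f}")

    Reporting the per-fold standard deviation alongside the mean is what lets two learners with similar average PD be distinguished by their stability, which is the point of the variance analysis described in the abstract.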

    Models for calculating confidence intervals for neural networks

    This research focused on coding and analyzing existing models for calculating confidence intervals on the outputs of neural networks. The three techniques for determining confidence intervals were non-linear regression, bootstrap estimation, and maximum likelihood estimation, and all three were coded in Visual Basic. The neural network used the backpropagation algorithm with an input layer, one hidden layer, and an output layer with a single unit; the hidden layer had a logistic (binary sigmoidal) activation function and the output layer had a linear activation function. These techniques were tested on various data sets with and without additional noise. Of the eight cases studied, non-linear regression and bootstrapping each yielded the four lowest values of the average coverage probability minus the nominal probability. Averaged over all data sets, bootstrap estimation obtained the lowest values of coverage probability minus nominal probability. The ranges and standard deviations of the coverage probabilities over 15 simulations were computed for the three techniques: non-linear regression gave the most consistent results, with the smallest range and standard deviation, while bootstrapping had the largest ranges and standard deviations. The bootstrap estimation technique thus gave a slightly better average coverage probability (CP) minus nominal value than the non-linear regression method, but with considerably more variation across individual simulations. Maximum likelihood estimation had the poorest results with respect to the average CP minus nominal values.
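    As a concrete illustration of the bootstrap technique above, the sketch below resamples the training pairs with replacement, refits the network on each resample, and takes percentile intervals over the resulting predictions; the empirical coverage probability (CP) is then compared with the nominal level, the study's key metric. This is a minimal Python sketch rather than the original Visual Basic code: the network matches the described architecture (one logistic-sigmoid hidden layer, a single linear output unit), but the data set, hidden-layer size, number of resamples, and noise level are all assumptions.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)

        # Synthetic data: a smooth target plus Gaussian noise, standing in
        # for the study's "data sets with and without additional noise".
        X = rng.uniform(-3.0, 3.0, size=(200, 1))
        true_mean = np.sin(X).ravel()
        y = true_mean + rng.normal(scale=0.1, size=200)

        def fit_net(Xs, ys):
            # Backpropagation network: one logistic-sigmoid hidden layer
            # and a single linear output unit, as described above.
            net = MLPRegressor(hidden_layer_sizes=(8,), activation="logistic",
                               solver="lbfgs", max_iter=2000, random_state=0)
            return net.fit(Xs, ys)

        B, nominal = 100, 0.95
        preds = np.empty((B, len(X)))
        for b in range(B):
            idx = rng.integers(0, len(X), size=len(X))  # resample with replacement
            preds[b] = fit_net(X[idx], y[idx]).predict(X)

        alpha = (1.0 - nominal) / 2.0
        lo = np.quantile(preds, alpha, axis=0)          # percentile interval
        hi = np.quantile(preds, 1.0 - alpha, axis=0)

        # These are confidence intervals for the regression mean, so coverage
        # is checked against the known noise-free mean (possible here only
        # because the data are synthetic).
        cp = np.mean((true_mean >= lo) & (true_mean <= hi))
        print(f"CP - nominal = {cp - nominal:+.3f}")

    Repeating this whole experiment several times and recording the range and standard deviation of CP across runs reproduces, in miniature, the study's observation that bootstrapping can be close to nominal on average while varying considerably between individual simulations.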